Search | BVS Regional Portal

Detecting goals of care conversations in clinical notes with active learning.

Weissenbacher, Davy; Courtright, Katherine; Rawal, Siddharth; Crane-Droesch, Andrew; O'Connor, Karen; Kuhl, Nicholas; Merlino, Corinne; Foxwell, Anessa; Haines, Lindsay; Puhl, Joseph; Gonzalez-Hernandez, Graciela.

J Biomed Inform ; 151: 104618, 2024 03.

Article in English | MEDLINE | ID: mdl-38431151

ABSTRACT

OBJECTIVE: Goals of care (GOC) discussions are an increasingly used quality metric in serious illness care and research. Wide variation in documentation practices within the Electronic Health Record (EHR) presents challenges for reliable measurement of GOC discussions. Novel natural language processing approaches are needed to capture GOC discussions documented in real-world samples of seriously ill hospitalized patients' EHR notes, a corpus with a very low event prevalence. METHODS: To automatically detect sentences documenting GOC discussions outside of dedicated GOC note types, we proposed an ensemble of classifiers aggregating the predictions of rule-based, feature-based, and three transformer-based classifiers. We trained our classifier on 600 manually annotated EHR notes among patients with serious illnesses. Our corpus exhibited an extremely imbalanced ratio between sentences discussing GOC and sentences that do not. This ratio challenges standard supervision methods to train a classifier. Therefore, we trained our classifier with active learning. RESULTS: Using active learning, we reduced the annotation cost to fine-tune our ensemble by 70% while improving its performance in our test set of 176 EHR notes, with 0.557 F1-score for sentence classification and 0.629 for note classification. CONCLUSION: When classifying notes, with a true positive rate of 72% (13/18) and false positive rate of 8% (13/158), our performance may be sufficient for deploying our classifier in the EHR to facilitate bedside clinicians' access to GOC conversations documented outside of dedicated note types, without overburdening clinicians with false positives. Improvements are needed before using it to enrich trial populations or as an outcome measure.
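The note-level rates quoted in the conclusion can be reproduced from the counts given in parentheses. The sketch below is only an arithmetic check, not the authors' evaluation code; the variable and function names are illustrative assumptions, and the counts are the ones stated in the abstract.

```python
# Arithmetic check of the note-level rates reported in the abstract.
# Names are illustrative; only the four counts come from the abstract.

true_positives = 13     # GOC notes the classifier flagged
total_positives = 18    # test notes that document a GOC discussion
false_positives = 13    # non-GOC notes the classifier flagged
total_negatives = 158   # test notes without a GOC discussion


def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, guarding against an empty class."""
    if denominator == 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


tpr = rate(true_positives, total_positives)    # true positive rate (recall)
fpr = rate(false_positives, total_negatives)   # false positive rate

print(f"TPR = {tpr:.0%}, FPR = {fpr:.0%}")  # prints "TPR = 72%, FPR = 8%"
```

The two denominators sum to 176, which matches the size of the test set reported in the results.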

Subjects

Communication, Documentation, Humans, Electronic Health Records, Natural Language Processing, Patient Care Planning

PhenoID, a language model normalizer of physical examinations from genetics clinical notes.

Weissenbacher, Davy; Rawal, Siddharth; Zhao, Xinwei; Priestley, Jessica R C; Szigety, Katherine M; Schmidt, Sarah F; Higgins, Mary J; Magge, Arjun; O'Connor, Karen; Gonzalez-Hernandez, Graciela; Campbell, Ian M.

medRxiv ; 2024 Jan 03.

Article in English | MEDLINE | ID: mdl-37904943

ABSTRACT

Background: Phenotypes identified during dysmorphology physical examinations are critical to genetic diagnosis and nearly universally documented as free-text in the electronic health record (EHR). Variation in how phenotypes are recorded in free-text makes large-scale computational analysis extremely challenging. Existing natural language processing (NLP) approaches to address phenotype extraction are trained largely on the biomedical literature or on case vignettes rather than actual EHR data. Methods: We implemented a tailored system at the Children's Hospital of Philadelphia that allows clinicians to document dysmorphology physical exam findings. From the underlying data, we manually annotated a corpus of 3136 organ system observations using the Human Phenotype Ontology (HPO). We provide this corpus publicly. We trained a transformer-based NLP system to identify HPO terms from exam observations. The pipeline includes an extractor, which identifies tokens in the sentence expected to contain an HPO term, and a normalizer, which uses those tokens together with the original observation to determine the specific term mentioned. Findings: We find that our labeler and normalizer NLP pipeline, which we call PhenoID, achieves state-of-the-art performance for the dysmorphology physical exam phenotype extraction task. PhenoID's performance on the test set was 0.717, compared to the nearest baseline system (Pheno-Tagger) performance of 0.633. An analysis of our system's normalization errors shows possible imperfections in the HPO terminology itself but also reveals a lack of semantic understanding by our transformer models. Interpretation: Transformer-based NLP models are a promising approach to genetic phenotype extraction and, with recent development of larger pre-trained causal language models, may improve semantic understanding in the future. We believe our results also have direct applicability to more general extraction of medical signs and symptoms.
Funding: US National Institutes of Health.

Automatic Extraction of Medication Mentions from Tweets-Overview of the BioCreative VII Shared Task 3 Competition.

Weissenbacher, Davy; O'Connor, Karen; Rawal, Siddharth; Zhang, Yu; Tsai, Richard Tzong-Han; Miller, Timothy; Xu, Dongfang; Anderson, Carol; Liu, Bo; Han, Qing; Zhang, Jinfeng; Kulev, Igor; Köprü, Berkay; Rodriguez-Esteban, Raul; Ozkirimli, Elif; Ayach, Ammer; Roller, Roland; Piccolo, Stephen; Han, Peijin; Vydiswaran, V G Vinod; Tekumalla, Ramya; Banda, Juan M; Bagherzadeh, Parsa; Bergler, Sabine; Silva, João F; Almeida, Tiago; Martinez, Paloma; Rivera-Zavala, Renzo; Wang, Chen-Kai; Dai, Hong-Jie; Alberto Robles Hernandez, Luis; Gonzalez-Hernandez, Graciela.

Database (Oxford) ; 2023, 2023 02 03.

Article in English | MEDLINE | ID: mdl-36734300

ABSTRACT

This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
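To make the scale of the imbalance concrete, the sketch below works through the corpus counts stated in the abstract. It is an illustration only, not part of the shared task materials; the variable names and the "always negative" baseline are assumptions introduced here to show why plain accuracy is uninformative at this prevalence.

```python
# Illustration of the class imbalance described in the abstract.
# Only the two corpus counts come from the abstract; the trivial
# "always negative" baseline is a hypothetical added for illustration.

total_tweets = 182_049    # tweets in the annotated corpus
positive_tweets = 442     # tweets mentioning a medication
negative_tweets = total_tweets - positive_tweets

# Share of tweets that mention a medication (about 0.2%).
prevalence = positive_tweets / total_tweets

# How many negative tweets there are for each positive one.
negatives_per_positive = negative_tweets / positive_tweets

# A baseline that labels every tweet negative gets every negative
# right and every positive wrong.
trivial_accuracy = negative_tweets / total_tweets
trivial_recall = 0 / positive_tweets

print(f"prevalence: {prevalence:.2%}")
print(f"negatives per positive: {negatives_per_positive:.0f}")
print(f"trivial baseline accuracy: {trivial_accuracy:.2%}, recall: {trivial_recall:.0%}")
```

Because the trivial baseline is almost always "correct" while finding no medication mentions, evaluations on corpora like this one typically rely on precision, recall, and F1 for the positive class rather than accuracy.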

Subjects

Data Mining, Natural Language Processing, Humans, Data Mining/methods

Automatic Cohort Determination from Twitter for HIV Prevention amongst Black and Hispanic Men.

Weissenbacher, Davy; Flores, J Ivan; Wang, Yunwen; O'Connor, Karen; Rawal, Siddharth; Stevens, Robin; Gonzalez-Hernandez, Graciela.

AMIA Jt Summits Transl Sci Proc ; 2022: 504-513, 2022.

Article in English | MEDLINE | ID: mdl-35854738

ABSTRACT

Recruiting people from diverse backgrounds to participate in health research requires intentional and culture-driven strategic efforts. In this study, we utilize publicly available Twitter posts to identify targeted populations to recruit for our HIV prevention study. Natural language processing and machine learning classification methods were used to find self-declarations of ethnicity, gender, age group, and sexually explicit language. Using the official Twitter API, we collected 47.4 million tweets posted over 8 months from two areas geo-centered around Los Angeles. Using available tools (Demographer and M3) to infer age and race, we identified 5,392 users as likely young Black or Hispanic men living in Los Angeles. We then collected and analyzed their timelines to automatically find sex-related tweets, yielding 2,166 users. Despite limited precision, our results suggest that it is possible to automatically identify users based on their demographic attributes and Twitter language characteristics for enrollment into epidemiological studies.
